Long-term survival of duplicate genes despite absence of subfunctionalized expression
Abstract
Gene duplication is a fundamental process in genome evolution. However, young duplicates are frequently degraded into pseudogenes by loss-of-function mutations. One standard model proposes that the main path for duplicate genes to avoid mutational destruction is by rapidly evolving subfunctionalized expression profiles. We examined this hypothesis using RNA-seq data from 46 human tissues. Surprisingly, we find that sub- or neofunctionalization of expression evolves very slowly, and is rare among duplications that arose within the placental mammals. Most mammalian duplicates are located in tandem and have highly correlated expression profiles, likely due to shared regulation, thus impeding subfunctionalization. Moreover, we find that a large fraction of duplicate gene pairs exhibit a striking asymmetric pattern in which one gene has consistently higher expression. These asymmetrically expressed duplicates (AEDs) may persist for tens of millions of years, even though the lower-expressed copies tend to evolve under reduced selective constraint and are associated with fewer human diseases than their duplicate partners. We suggest that dosage-sharing of expression, rather than subfunctionalization, is more likely to be the initial factor enabling survival of duplicate gene pairs.

Main Text

Gene duplications are a major source of new genes, and ultimately of new biological functions (1–3). Gene duplication is likely the primary mechanism by which new genes are born (4). However, the evolutionary forces governing the initial spread and persistence of young duplicates remain controversial (5). New duplicates are usually functionally redundant and thus susceptible to loss-of-function mutations that degrade one of the copies into a pseudogene. The average half-life of new duplicates has been estimated at just 4 million years (6). Much work has gone into models that explain why many duplicate pairs nevertheless survive over long evolutionary timescales (5).
These models generally assume that long-lived duplicates must evolve distinct functions to avoid mutational degradation, either by neofunctionalization (in which one copy gains new functions) or subfunctionalization (in which the copies divide the ancestral functions between them). One influential model, known as Duplication-Degeneration-Complementation (DDC), proposes that complementary degeneration of regulatory elements may lead to the copies being expressed in different tissues, such that both copies are required to provide the overall expression of the ancestral gene (7). Similarly, neofunctionalization of expression could lead to one gene copy gaining function in a tissue where the parent gene was not expressed. Functional divergence may also occur at the protein level (8), but it is generally thought that divergence usually starts through changes in regulation (1, 9). Several empirical studies have measured functional redundancy of duplicates, but overall patterns and conclusions are inconsistent across organisms and approaches (10–16). One study of single and double knockouts of yeast duplicates reported surprisingly high levels of apparent redundancy even among old duplicate pairs (10).

We therefore set out to test whether modern gene expression data from many tissues in human and mouse support the standard model of duplicate preservation by subfunctionalization of expression. Based on theoretical models and previous literature, we expected that, aside from the youngest duplicates, most duplicate pairs would be functionally distinct, and that the primary mechanism for this would be divergent expression profiles. In particular, the sub- and neofunctionalization models suggest that, for each duplicate gene, there should be at least one tissue where that gene is more highly expressed than its partner.
To test this prediction, we analyzed RNA-seq data from ten individuals for each of 46 diverse human tissues collected by the GTEx Project (17), and replicated our general conclusions using RNA-seq from 26 diverse mouse tissues (18). We first developed a computational pipeline for identifying duplicate gene pairs in the human genome (Supp. Inf. Section 2). After excluding annotated pseudogenes, we identified 1,444 high-confidence reciprocal best-hit duplicate gene pairs with >80% alignable coding sequence and >50% average sequence identity. We used synonymous divergence dS as a proxy for divergence time, while noting that divergence of gene pairs may be downwardly biased due to nonallelic homologous gene conversion in young duplicates (19). We estimate that dS for duplicates that arose at the time of the human-mouse split averages ∼0.4 and that most pairs with dS > ∼0.7 predate the origin of the placental mammals (Figs. S3, S4).

As expected, there is a peak of very young gene pairs dating to within the apes (dS < 0.1; Fig. 1A). This peak reflects the fact that only a small fraction of duplicates survive long-term (6). Most of the 621 pairs with dS < 0.7 are physically close in the genome, likely because they were generated by segmental duplications. Older duplicates have often been separated by genomic rearrangements, although this is a very slow process. We estimate that 6% of the identified duplicates arose from retrotransposition (Fig. 1A; Supp. Inf. Section 5).

[Figure 1 plots: x-axis dS (0 to 2), y-axis number of pairs (0 to 300). Panel A, classification of duplicates: retrotransposition; different chromosomes; same chr. > 1 Mb; same chr. < 1 Mb. Panel B, expression patterns of duplicates: unmappable; no difference; asymmetrically expressed; sub-/neofunctionalized.]

Figure 1: Properties of duplicate gene pairs. A. Numbers of pairs for different values of dS, showing that most young pairs are nearby in the genome. B.
Classification of gene pairs by expression patterns; note that sub-/neofunctionalization is rare except among older gene pairs. “Unmappable” indicates that RNA-seq reads could not be uniquely mapped to both genes. For context, note that duplicates arising at the human-mouse split would have dS ∼0.4.

We next considered GTEx RNA-seq data from 46 tissues. Accurate measurement of expression in gene duplicates can be challenging because some RNA-seq reads may map equally well to both gene copies. There are also cases where reads from one gene copy map better than reads from the other copy, due to differential homology with other genomic locations. To overcome these challenges, we developed a new method specifically for estimating the expression levels of duplicate genes (Supp. Inf. Section 3). In brief, we identified paralogous positions within each duplicate pair for which reads from both copies would map uniquely to the correct gene. Only these positions were used for estimating expression ratios. This approach is analogous to methods for measuring allele-specific expression (20). These strict criteria mean that some very young genes are excluded from our expression analyses as unmappable but, for the remaining genes, simulations show that our method yields highly accurate, unbiased estimates of expression ratios (Fig. S1).

Figure 2: Expression of duplicate genes in representative tissues. A. A gene pair with an expression profile consistent with sub- or neofunctionalization: i.e., each gene is significantly more highly expressed than the other in at least one tissue. B. An asymmetrically expressed gene pair. Notice that expression of CBR1 exceeds expression of CBR3 in all tissues. Introns have been shortened for display purposes. The Y-axis shows read depth per billion mapped reads. Green regions in the gene models are unmappable.

This new read mapping pipeline allowed us to classify gene pairs into categories based on their co-expression patterns (Supp. Inf.
Sections 3, 6). We classified a gene pair as being potentially sub-/neofunctionalized if both genes have significantly higher expression (at least 2-fold difference and p<0.001) in at least one tissue each.

Analysis of the RNA-seq data shows that surprisingly few duplicate pairs show any evidence of sub-/neofunctionalization of expression (Fig. 1B; example in Fig. 2A). Moreover, most gene pairs that do show such patterns are very old, dating to before the emergence of the placental mammals: for duplicates with dS < 0.7, just 10.7% of duplicates are classified as potentially sub-/neofunctionalized in expression. Given that even modest variation in expression profiles across tissues would meet our criteria for subfunctionalization, the fraction of truly subfunctionalized duplicates is probably even lower.

Of course, some additional duplicates might show evidence for subfunctionalization in a tissue not measured by GTEx. However, it is unlikely that such cases would dramatically change the overall picture: we show below that genes with evidence for subfunctionalization in our data are significantly more conserved, and have a higher burden of gene-specific diseases, compared to duplicates without evidence for subfunctionalization. This argues strongly that our data are not simply an artifact of misclassification. We also find very similar patterns in a mouse dataset with better representation of fetal tissues (18) (Fig. S11). We wondered whether gene pairs with higher tissue specificity might be more subfunctionalized (as they may have more tissue-specific enhancers), but this is not the case (Fig. S10). Finally, we hypothesized that subfunctionalization might instead occur through differential splicing of exons (21); however, we found little evidence for this among mammalian duplicates (Fig. S13, Supp. Inf. Section 9).

Figure 3: Gene expression ratios for duplicate gene pairs. A.
Heat maps of expression ratios for a representative sample of duplicate gene pairs, at three levels of synonymous divergence, dS. For each duplicate pair (plotted in columns) the ratios show the tissue-specific expression level of the gene with lower global expression relative to its duplicate. Blue indicates significantly lower expression of the minor gene in a particular tissue; red indicates significantly higher expression of the minor gene (p<0.001 for both cases). Black indicates no significant difference. B. Distributions of expression ratios for duplicate gene pairs in representative tissues. Labeling same as in A. Notice that for most gene pairs, the minor gene has consistently lower expression than the major gene, with few clear cases of subfunctionalization (i.e., mix of red/blue) except for the most diverged gene pairs. Testis is often an outlier, enriched with sub- or neofunctionalized genes (16).

Surprisingly, many gene pairs instead show an unexpected asymmetric pattern of gene expression, in which one gene has consistently higher expression. For each pair, we classified the gene with higher overall expression as the “major” gene, and its partner as the “minor” gene. On average, minor genes are expressed at 40% of the level of major genes. Moreover, a large fraction of gene pairs show particularly strong asymmetry; see Fig. 2B and Fig. 3 for examples. We classified a gene pair as an asymmetrically expressed duplicate (AED) if the major gene was significantly more highly expressed (p<0.001) in at least 1/3 of tissues where either gene is expressed, and not lower expressed than the minor gene in any tissue. The remaining duplicates were classified as no difference pairs, though many show weaker levels of asymmetry. Among duplicates that arose within the placental mammals, AEDs are much more common than potentially subfunctionalized pairs: 36.7% of duplicates with dS < 0.7, compared to just 10.7% of potentially sub-/neofunctionalized pairs.
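The three expression categories described above can be illustrated with a small sketch. This is not the authors' pipeline (that is described in Supp. Inf. Sections 3 and 6); it is a minimal reading of the stated thresholds. The function name `classify_pair`, the input layout, and the `min_expr` cutoff for calling a gene "expressed" are hypothetical, and the sketch assumes that "overall expression" means total expression across tissues and that "not lower expressed" means not significantly lower.

```python
from typing import Iterable, Tuple


def classify_pair(
    tissues: Iterable[Tuple[float, float, float]],
    p_cut: float = 0.001,      # significance threshold quoted in the text
    fold_cut: float = 2.0,     # 2-fold requirement for sub-/neofunctionalization
    min_frac: float = 1 / 3,   # AED: major gene higher in at least 1/3 of tissues
    min_expr: float = 0.0,     # hypothetical cutoff for "expressed"
) -> str:
    """Classify one duplicate pair from per-tissue (expr_a, expr_b, p_value)."""
    a_sig = b_sig = 0            # tissues where A (or B) is significantly higher
    a_sig_fold = b_sig_fold = 0  # same, additionally requiring the fold change
    n_expressed = 0              # tissues where either gene is expressed
    total_a = total_b = 0.0      # used to pick the "major" gene (assumption)

    for expr_a, expr_b, p_value in tissues:
        total_a += expr_a
        total_b += expr_b
        if max(expr_a, expr_b) <= min_expr:
            continue
        n_expressed += 1
        if p_value >= p_cut:
            continue
        if expr_a > expr_b:
            a_sig += 1
            if expr_a >= fold_cut * expr_b:
                a_sig_fold += 1
        elif expr_b > expr_a:
            b_sig += 1
            if expr_b >= fold_cut * expr_a:
                b_sig_fold += 1

    # Each gene is significantly (and at least 2-fold) higher in some tissue.
    if a_sig_fold > 0 and b_sig_fold > 0:
        return "sub/neofunctionalized"

    if total_a >= total_b:
        major_sig, minor_sig = a_sig, b_sig
    else:
        major_sig, minor_sig = b_sig, a_sig

    # Major gene higher in enough tissues, minor gene never significantly higher.
    if n_expressed > 0 and minor_sig == 0 and major_sig / n_expressed >= min_frac:
        return "AED"
    return "no difference"
```

For example, a pair whose first gene is significantly higher in two of three expressed tissues, and never significantly lower, would be labeled "AED" under these assumptions, while a pair in which each gene wins at least one tissue by 2-fold would be labeled "sub/neofunctionalized".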
Most remaining gene pairs show little or no difference in expression within tissues (Fig. 1B; black in Fig. 3).

To learn whether AEDs are a functionally meaningful category, we examined the numbers of known diseases (22) associated with different types of duplicate pairs (Fig. 4A). As might be expected, AED minor genes are significantly less likely than major genes to be associated with diseases (32% vs 46%; p=9×10^-5). Across all duplicates, there is a strong effect: the lower the expression of the minor gene compared to the major gene, the lower the disease burden of the minor gene (p=4×10^-10, controlling for dS; see Supp. Inf. Section 8 for details) (Table S2). In contrast, the extent of subfunctionalization is highly positively correlated with the number of gene-specific diseases (p=1×10^-9) (Table S3).

AEDs are thus somewhat mysterious: why should a large class of duplicates with broadly reduced expression be maintained in the genome? Are these genes functionally constrained, or simply destined for mutational oblivion? We measured the strength of sequence conservation on major genes compared to minor genes of AEDs. Minor genes do show clear evidence of functional constraint: 97% of minor genes have dN/dS<1, which is a hallmark of protein-coding constraint (Fig. 4B). Nonetheless, minor genes evolve under relaxed selective constraint relative to major genes, both between species (Figs. 4B, S14) and within the human population, where minor genes have a higher rate of common missense and nonsense variants compared to major genes (Figs. 4C, S16, S17).

An alternative hypothesis for the preservation of duplicates is that they can become non-redundant by sharing the required dosage of gene expression (25). This model suggests that duplicates should rapidly evolve reduced expression, such that the summed expression of the duplicates is close to that of the parent gene. Subsequently, loss of either gene is deleterious because it leads to a deficit of expression.
To evaluate this, we analyzed the expression of human duplicates that arose since the human-macaque split using RNA-seq data from 6 tissues in human and macaque (24) (Fig. 4D, Supp. Inf. Section 7). Indeed, there is a very clear signal that both copies tend to evolve reduced expression, such that the median summed expression of the human duplicates is very close to the expression of the singleton orthologs in macaque (median expression ratio 1.11; this is significantly less than the 2:1 expression ratio expected based on copy number, p=7.6×10^-6). Thus, our data suggest a model of duplicate preservation by dosage sharing, rather than subfunctionalization.

Finally, to understand why subfunctionalization evolves so slowly, we explored the

[Figure 4 panels, as recoverable from the plot labels. A. Associated diseases: mean number of minor-gene-associated diseases against log2(minor/major expression ratio), p = 6e-10; mean number of minor-gene-specific diseases against log2(proportion of tissues in which the minor gene is expressed higher), p = 8e-11. B. dN/dS (AEDs): major genes vs minor genes in dS bins 0-0.1, 0.1-0.5, 0.5-1.0, 1.0-1.5, 1.5-2.0. C. Allele frequency distribution (AEDs): cumulative proportion against derived allele frequency for synonymous, nonsynonymous, and nonsense variants, major vs minor genes. D. Expression vs macaque: log2 expression ratio for sum, major, minor, and singletons, with reference lines at ratio 2:1 and ratio 1:1.]
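The dosage-sharing comparison above rests on one summary statistic: the median ratio of summed duplicate expression in human to singleton expression in macaque, judged against the 1:1 and 2:1 reference ratios. A minimal sketch of that calculation follows. The function name `median_summed_ratio` and the toy numbers are hypothetical, the median is taken on the log2 scale as an assumption, and the significance test reported in the text is not reproduced here.

```python
import math
from statistics import median
from typing import Iterable, Tuple


def median_summed_ratio(pairs: Iterable[Tuple[float, float, float]]) -> float:
    """Median of (major + minor) / singleton across pairs.

    Each item is (human_major, human_minor, macaque_singleton) expression.
    Pairs with non-positive summed or singleton expression are skipped.
    The median is taken on the log2 scale (an assumption of this sketch).
    """
    log2_ratios = [
        math.log2((major + minor) / singleton)
        for major, minor, singleton in pairs
        if singleton > 0 and (major + minor) > 0
    ]
    if not log2_ratios:
        raise ValueError("no usable pairs")
    return 2 ** median(log2_ratios)


# Made-up toy values, for illustration only (not data from the study).
toy_pairs = [(6.0, 4.0, 10.0), (8.0, 3.0, 10.0), (5.0, 5.0, 8.0)]
ratio = median_summed_ratio(toy_pairs)
# A value near 1 is what dosage sharing predicts; a value near 2 is what
# unchanged per-copy expression would predict.
print(f"median summed ratio: {ratio:.2f}")
```

Under dosage sharing the statistic should sit close to 1, as with the reported median of 1.11, whereas two copies each retaining the ancestral expression level would give a value close to 2.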
Publication date: 2015